Abstract

In this work we consider the image matching problem for two grayscale $n \times n$ images, $M_1$
and $M_2$ (where pixel values range from 0 to 1). Our goal is to find an affine transformation $T$
that maps pixels from $M_1$ to pixels in $M_2$ so that the sum over pixels $p$ of the differences between $M_1(p)$
and $M_2(T(p))$ is minimized. Our focus here is on sublinear algorithms that give an approximate
result for this problem; that is, we wish to perform this task while querying as few pixels from
both images as possible, and to give a transformation that comes close to minimizing the difference.

We give an algorithm for the image matching problem that returns a transformation $T$ which
minimizes the sum of differences (normalized by $n^2$) up to an additive error of $\epsilon$ and performs
$\tilde{O}(n/\epsilon^2)$ queries. We give a corresponding lower bound of $\Omega(n)$ queries showing that this is the
best possible result in the general case (with respect to $n$ and up to low order terms).

In addition, we give a significantly better algorithm for a natural family of images, namely,
smooth images. We consider an image smooth when the total difference between neighboring
pixels is $O(n)$. For such images we provide an approximation of the distance between the images
to within an additive error of $\epsilon$ using a number of queries depending polynomially on $1/\epsilon$ and
not on $n$. To do this we first consider the image matching problem for 2- and 3-dimensional
binary images, and then reduce the grayscale image matching problem to the 3-dimensional
binary case.



1 Introduction 



Similarity plays a central part in perception and categorization of visual stimuli. It is no wonder that 
similarity has been intensely studied, among others, by cognitive psychologists [161 E] and computer 
vision and pattern recognition researchers. Much of the work on computer vision, including that on 
image matching, involves algorithms that require a significant amount of processing time, whereas 
many of the uses of these algorithms would typically require real-time performance. 

A motivating example is that of image registration \19\ [TS] . Here we are given two images of a 
particular scene or object (e.g., two pictures taken from a video sequence) and wish to match one 
image to the other, for tasks such as motion detection, extraction of 3-dimensional information, 
noise-reduction and super-resolution. Many advances were made in dealing with this task and it 
can now be performed in a wide variety of situations. However, image registration algorithms are 
generally time consuming. 

Image registration is an application of a more abstract computational problem - the image
matching problem [8] [11]. In this problem we are given two digital $n \times n$ images $M_1$ and $M_2$
and wish to find a transformation that changes $M_1$ so that it best resembles $M_2$. In this work we
consider the distance between two $n \times n$ images $M_1$ and $M_2$ when we perform affine transformations
on their pixels. Namely, given an affine transformation $T$, we sum over all pixels $p$ the absolute value
of the difference between $M_1(p)$ and $M_2(T(p))$, where the difference is considered to be 1 for pixels
mapped outside $M_2$. The distance between $M_1$ and $M_2$ is defined as the minimum such distance
taken over all affine transformations. Our focus is on affine transformations as such transformations
are often used when considering similarity between images. We limit ourselves to transformations
with a bounded scaling factor. This is congruent with applications, and prevents situations such
as one image mapping to very few pixels in another. Exact algorithms for this problem generally
enumerate all possible different transformations, fully checking how well each transformation fits
the images. Hundt and Liskiewicz [HJ give such an algorithm for the set of affine transformations 
on images with $n \times n$ pixels (transformations on which we focus in this paper), that runs in time
$O(n^{18})$. They also prove a lower bound of $\Omega(n^{12})$ on the number of such transformations (which
implies an $\Omega(n^{12})$ lower bound on algorithms using this technique).

As known exact algorithms have prohibitive running times, image registration algorithms used 
in practice are typically heuristic. These algorithms often reduce the complexity of the problem by 
roughly matching "feature points" |X9^ HH] - points in the images that have relatively distinct char- 
acteristics. Such heuristic algorithms are not analyzed rigorously, but rather evaluated empirically. 

Two related problems are those of shape matching and of point set matching, or point pattern 
matching. In shape matching [18] the goal is to find a mapping $T$ between two planar shapes $S_1$
and $S_2$, minimizing a variety of distance measures between the shapes $T(S_1)$ and $S_2$. A problem
of similar flavor is that of point set matching [5], where we are given two (finite) sets of points $A$
and $B$ in a Euclidean space and seek to map $A$ to a set $T(A)$ that minimizes the distance between
$T(A)$ and $B$ under some distance metric. Algorithms for these exact problems were given by Alt
et al. [2] and by Chew et al. [1]; both require prohibitive running times. Recent research [3] has
focused on finding transformations that are close to the optimal one while requiring less time. It should
be noted that the running times of the algorithms in [3] are superlinear in the number of points in
A and B. We emphasize that these works are concerned with planar shapes and point sets rather 
than digital images. 

Our main contribution is devising sublinear algorithms for the image matching problem. Sub- 
linear algorithms are extremely fast (and typically randomized) algorithms that use techniques 
such as random sampling to assess properties of objects with arbitrarily small error. The number
of queries made by such algorithms is sublinear in the input size, and generally depends on the
error parameter. The use of sublinear algorithms in image processing was advocated by Raskhodnikova
[13] who pioneered their study for visual properties. She gave algorithms for binary (0-1)
images, testing the properties of connectivity, convexity and being a half-plane. In her work, an
image is considered far from having such a property if it has a large Hamming distance from every
image with the property. Ron and Tsur [15] introduced a different model that allowed testing of
sparse images (where there are $o(n^2)$ different 1-pixels) for similar properties. Kleiner et al. [10]
give results on testing images for a partitioning that roughly respects a certain template. Unlike the
aforementioned works we do not deal only with binary images, but also consider grayscale images, 
where every pixel gets a value in the range [0, 1]. 

1.1 Our Results 

In this work we prove both general results and results for smooth images. 

1. General Upper Bound: We present an algorithm that when given access to any two $n \times n$
grayscale images $M_1$ and $M_2$ and a precision parameter $\epsilon$ returns a transformation $T$ such
that the distance between $M_1$ and $M_2$ using $T$ is at most $\epsilon$ greater than the minimum distance
between them (taken over all affine transformations). The query complexity of this algorithm
is $\tilde{O}(n/\epsilon^2)$, which is sublinear in $n^2$, the size of the matrices.

2. Lower Bound: We show that every algorithm estimating matching between images within
additive error smaller than 1/4 must make an expected $\Omega(n)$ number of queries.

3. Upper Bound For Smooth Images: We show that if the images $M_1$ and $M_2$ are smooth, that
is, for both images the total difference between neighboring pixels is $O(n)$, then for every
positive $\epsilon$ we can find a transformation $T$ such that the distance between $M_1$ and $M_2$ using
$T$ is at most $\epsilon$ greater than the minimum distance between them. This can be done using a
number of queries that is polynomial in $1/\epsilon$ and does not depend on $n$.

Being smooth is a property of many natural images - research has shown a power-law distribution
of spatial frequencies in images \17\ [T2] , translating to very few fast changes in pixel intensity. While 
we show that our algorithm works well with smooth images, we note that distinguishing between
images that have a total difference between neighboring pixels of $O(n)$ and those with a total of
$O(n) + k$ requires $\Omega(n^2/k)$ queries.

An unusual property of the way distance between images is defined in this work is that it is
not symmetric. In fact, an image $M_1$ may have a mapping that maps all its pixels to only half the
pixels in $M_2$, so that each pixel is mapped to a pixel with the same value, while any mapping
from $M_2$ to $M_1$ leaves a constant fraction of the pixels in $M_2$ mapped either outside $M_1$ or to pixels
that do not have the same color. (To see this, consider an image $M_1$ that has only black points, and
an image $M_2$ that is black on the left side and white on the right.) We note that one can use
the algorithms presented here also to measure symmetric types of distances by considering inverse
mappings.

Techniques 






The Algorithm for the General Case: Imagine that sampling a pair of pixels, $p \in M_1$
and $q \in M_2$, would let us know how well each affine transformation $T$ did with respect to the
pixel $p$, that is, what the difference is between $M_1(p)$ and $M_2(T(p))$. The way we define grayscale
values (as ranging from 0 to 1), we could sample $O(1/\epsilon^2)$ random pairs of points and have, for every
transformation, an approximation of the average difference between points up to an additive error
of $\epsilon$ with constant probability. As there are polynomially many different affine transformations,
if we increased the number of samples to amplify the probability of correctness, we could use
$\tilde{O}(\log(n)/\epsilon^2)$ queries and return a transformation that was $\epsilon$-close to the best.¹ However, when we
sample $p \in M_1$ and $q \in M_2$ uniformly at random we get a random pixel and its image under only a
few of the different transformations. We show that $\tilde{O}(n/\epsilon^2)$ queries suffice to get a good estimation
of the error for all interesting transformations (that is, transformations that map a sufficiently large
portion of pixels from $M_1$ to pixels in $M_2$). Using these pixels we can return a transformation that
is close to optimal as required.

The Lower Bound: We prove the lower bound by giving two distributions of pairs of images.
In the first, the images are random 0-1 images and far from each other. In the second, one image
is partially created from a translation of the other. We show that any algorithm distinguishing
between these families must perform $\Omega(n)$ expected queries. The proof of the lower bound is
somewhat similar to the lower bound given by Batu et al. [3] on the number of queries required to
approximate edit distance. Here we have a two-dimensional version of roughly the same argument.
Note that a random 0-1 image is far from being smooth, that is, many pixels have a value
significantly different from that of their neighbors.

The Algorithm For Smooth Images: Our analysis of the algorithm for smooth images
begins by considering binary images. The boundary of a 0-1 image $M$ is the set of pixels that have
a neighboring pixel with a different value. We consider two affine transformations $T, T'$ close if for
every pixel $p$ the distance in the plane between $T(p)$ and $T'(p)$ is small. Only points that are close
to the boundary might be mapped to different values by close transformations $T$ and $T'$ (meaning
that the pixel will be mapped correctly by one and not by the other - see Figure 1). It follows
that if there is a big difference in the distance between $M_1$ and $M_2$ when mapped by $T$ and their
distance when mapped by $T'$, then the perimeter, the size of the boundary, is large. This implies
that when the perimeter is small, one can sample a transformation $T$ and know a lot about the
distance between images for transformations that are "close" to $T$. This idea can be generalized
to 3-dimensional binary images. Such 0-1 images are a natural object in 3 dimensions as color is
not a feature typically attributed to areas within a body. More importantly, however, one can use
3-dimensional binary images to model 2-dimensional grayscale images. Smooth grayscale images,
i.e., images where the sum of differences (in absolute value) between neighboring pixels is $O(n)$,
translate to 3-dimensional binary images that have a small perimeter. An appropriate version of
the 3-dimensional algorithm can be used to get a good approximation for the mapping between
two grayscale images.

Organization 

We begin by giving some preliminaries in Section 2. We then describe and prove the correctness
of the algorithm for the general case (with a query complexity of $\tilde{O}(n/\epsilon^2)$). We give the lower
bound in the next section. Following that we give the algorithm for smooth binary images in
Section 4.1. In Section 4.2 we give an explicit construction of an $\epsilon$-net of transformations such
that any transformation is close to one of those in the net. In Section 4.3 we give the
three-dimensional version of our algorithm, and in Section 4.4 we show how to use this version to work
with grayscale images.

¹ The $\tilde{O}$ symbol hides logarithmic factors.



2 Preliminaries 

We are given two images represented by $n \times n$ matrices. For grayscale images the values of entries
in the matrix are in the range [0, 1] and for binary images they are either 0 or 1.

Definition 2.1 A pixel $p$ in an $n \times n$ image $M$ is a pair of coordinates, namely a pair
$(i, j) \in \{1, \ldots, n\}^2$. We denote this as $p \in M$.

Definition 2.2 The value of a pixel $p = (i, j)$ in an image $M$ is $M[i, j]$, or $M(p)$.

Definition 2.3 For $r \in \mathbb{R}^2$ we denote by $\lfloor r \rfloor$ the pixel $p$ that the point $r$ falls in.

Definition 2.4 A transformation $T$ has a scaling factor in the range $[1/c, c]$ (for a positive constant
$c$) if for all vectors $v$ it holds that $\|v\|/c \le \|Tv\| \le c\|v\|$.

Here we are particularly interested in affine transformations in the plane that are used to map 
one pixel to another, when these transformations have a scaling factor in the range [1/c, c] for a fixed 
positive constant c. Such a transformation T can be seen as multiplying the pixel vector by a 2 x 2 
non-singular matrix and adding a "translation" vector, then rounding down the resulting numbers. 
When comparing two images, requiring the matrix to be non-singular prevents the transformation 
from mapping the image plane in one image onto a line or a point in the other. 

Given an affine transformation in the form of a matrix $A$ and a translation vector $t$, there is
a corresponding transformation $T(p) = \lfloor Ap + t \rfloor$. We call $T$ an image-affine transformation and
we say that T is based on A and t. Generally speaking, when we discuss algorithms getting an 
image-affine transformation as input, or enumerating such transformations, we assume that these 
transformations are represented in matrix format. 

Definition 2.5 The distance between two $n \times n$ images $(M_1, M_2)$ with respect to a transformation
$T$, which we denote $\Delta_T(M_1, M_2)$, is defined as

$$\Delta_T(M_1, M_2) = \frac{1}{n^2} \left( \left|\{p \in M_1 \mid T(p) \notin M_2\}\right| + \sum_{p \in M_1 \mid T(p) \in M_2} \left|M_1(p) - M_2(T(p))\right| \right)$$

Note that the distance $\Delta_T(M_1, M_2)$ ranges from 0 to 1.

Definition 2.6 We define the distance between two images $(M_1, M_2)$ (which we denote $\Delta(M_1, M_2)$)
as the minimum over all image-affine transformations $T$ of $\Delta_T(M_1, M_2)$.
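
The distance with respect to a single transformation can be evaluated directly from its definition. The following is a minimal sketch, assuming images are stored as Python lists of rows with 1-indexed pixel coordinates; the names `apply_image_affine` and `distance_under` are hypothetical and not taken from the paper. It reads every pixel, so it is a brute-force reference for checking, not one of the sublinear algorithms.

```python
import math


def apply_image_affine(A, t, p):
    """Return floor(A p + t) for a pixel p = (i, j)."""
    (a, b), (c, d) = A
    i, j = p
    return (math.floor(a * i + b * j + t[0]),
            math.floor(c * i + d * j + t[1]))


def distance_under(M1, M2, A, t):
    """Exact distance between M1 and M2 with respect to T(p) = floor(A p + t).

    M[i - 1][j - 1] holds the value of the 1-indexed pixel (i, j).
    """
    n = len(M1)
    total = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            x, y = apply_image_affine(A, t, (i, j))
            if 1 <= x <= n and 1 <= y <= n:
                total += abs(M1[i - 1][j - 1] - M2[x - 1][y - 1])
            else:
                total += 1.0  # a pixel mapped outside M2 counts as difference 1
    return total / (n * n)
```

For example, with the identity matrix and a zero translation the distance between an image and itself is 0, while a translation that pushes every pixel outside the image gives distance 1.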

Definition 2.7 Two different pixels $p = (i, j)$ and $q = (i', j')$ are adjacent if $|i - i'| \le 1$ and
$|j - j'| \le 1$.

The following definitions relate to binary (0-1) images:

Definition 2.8 A pixel $p = (x, y)$ is a boundary pixel in an image $M$ if there is an adjacent pixel
$q$ such that $M(p) \ne M(q)$.

Definition 2.9 The perimeter of an image $M$ is the set of boundary pixels in $M$ as well as the
$4n - 4$ outermost pixels in the square image. We denote the size of the perimeter of $M$ by $P_M$.

Note that $P_M$ is always $\Omega(n)$ and $O(n^2)$.
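
The perimeter size can be computed by a direct scan. The sketch below assumes a binary image stored as a list of rows and uses the 8-neighbour adjacency of Definition 2.7; the function name `perimeter_size` is hypothetical.

```python
def perimeter_size(M):
    """Count boundary pixels together with the outermost frame of the image."""
    n = len(M)
    count = 0
    for i in range(n):
        for j in range(n):
            on_frame = i in (0, n - 1) or j in (0, n - 1)
            # A boundary pixel has an adjacent pixel with a different value.
            is_boundary = any(
                M[i + di][j + dj] != M[i][j]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di, dj) != (0, 0)
                and 0 <= i + di < n
                and 0 <= j + dj < n
            )
            if on_frame or is_boundary:
                count += 1
    return count
```

A constant image attains the minimum of $4n - 4$, while a checkerboard makes every pixel a boundary pixel and attains $n^2$.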

3 The General Case 



We now present the algorithm for general images. The lower bound we will prove in Section 3.2
demonstrates that this algorithm has optimal query complexity, despite having a prohibitive running
time. The main significance of the algorithm is in showing that one can achieve query complexity
of $\tilde{O}(n)$. It is an open question if one can achieve this query complexity in sublinear time or even
significantly faster than our running time.

3.1 The Algorithm 

Algorithm 1 Input: Oracle access to $n \times n$ images $M_1, M_2$, and a precision parameter $\epsilon$.

1. Sample $k = \tilde{\Theta}(n/\epsilon^2)$ pixels $\mathcal{P} = p_1, \ldots, p_k$ uniformly at random (with replacement) from $M_1$.

2. Sample $k$ pixels $Q = q_1, \ldots, q_k$ uniformly at random (with replacement) from $M_2$.

3. Enumerate all image-affine transformations $T_1, \ldots, T_m$ (recall that $m$, the number of image-affine
transformations, is in $O(n^{18})$).

4. For each transformation $T_\ell$ denote by $Out_\ell$ the number of pixel coordinates that are mapped
by $T_\ell$ out of the region $[1, n]^2$.

5. For each transformation $T_\ell$ denote by $Hit_\ell$ the number of pairs $p_i, q_j$ such that $T_\ell(p_i) = q_j$,
and denote by $Bad_\ell$ the value

$$Bad_\ell = \frac{1}{|\{p_i \in \mathcal{P}, q_j \in Q \mid T_\ell(p_i) = q_j\}|} \sum_{p_i \in \mathcal{P}, q_j \in Q \mid T_\ell(p_i) = q_j} |M_1(p_i) - M_2(q_j)|$$

6. Return $T_\ell$ that minimizes $(n^2 - Out_\ell) \cdot Bad_\ell$ (discarding transformations such that $Hit_\ell < \epsilon$).
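
Step 5 can be sketched for a single transformation as follows. This is an illustration under stated assumptions, not the authors' implementation: the transformation is passed as a Python callable on 1-indexed pixel pairs, images are lists of rows, and the name `hit_and_bad` is hypothetical. Indexing the sample from $M_2$ in a dictionary avoids comparing all $k^2$ pairs.

```python
from collections import defaultdict


def hit_and_bad(M1, M2, P, Q, T):
    """Return (Hit, Bad) for transformation T, given samples P from M1 and Q from M2.

    Hit counts the sampled pairs (p, q) with T(p) == q, with multiplicity.
    Bad is the average of |M1(p) - M2(q)| over those pairs, or None if Hit == 0.
    """
    q_count = defaultdict(int)
    for q in Q:
        q_count[q] += 1  # samples are drawn with replacement

    hit = 0
    total = 0.0
    for p in P:
        q = T(p)
        c = q_count.get(q, 0)
        if c:
            hit += c
            total += c * abs(M1[p[0] - 1][p[1] - 1] - M2[q[0] - 1][q[1] - 1])
    bad = total / hit if hit else None
    return hit, bad
```

Only the sampled pixels are read from the images, so the number of queries is determined by the sizes of the two samples and not by the number of transformations examined.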



Theorem 3.1 With probability at least 2/3 Algorithm 1 returns a transformation $T$ such that
$|\Delta_T(M_1, M_2) - \Delta(M_1, M_2)| \le \epsilon$.

We prove Theorem 3.1 by showing that for any fixed transformation $T_\ell$ (where $Hit_\ell \ge \epsilon$)
the sample we take from both images gives us a value $Bad_\ell$ that is a good approximation of the
value

$$\frac{1}{|\{p \in M_1, q \in M_2 \mid T_\ell(p) = q\}|} \sum_{p \in M_1, q \in M_2 \mid T_\ell(p) = q} |M_1(p) - M_2(q)|$$

with high probability, and applying a union bound. To show this we give several definitions and claims. For these we fix an
image-affine transformation $T$ and two images $M_1$ and $M_2$, so that $T$ maps at least an $\epsilon/2$ fraction of the
pixels in $M_1$ to pixels in $M_2$ (note that transformations that do not map such an $\epsilon/2$ portion of
pixels are discarded by the algorithm with very high probability).
1. Let $T(M_1)$ be the set of pixels $q \in M_2$ such that there exists a pixel $p \in M_1$ with $T(p) = q$.

2. For a set of pixels $Q \subseteq M_2$ let $T^{-1}(Q)$ denote the set $\{p \in M_1 \mid T(p) \in Q\}$.

3. We denote by $Q'$ the points that are in $Q$ (the sample of points taken from $M_2$) and in $T(M_1)$.

4. We denote by $\mathcal{P}'$ the points $p \in \mathcal{P}$ such that $T(p) \in Q'$.

5. For a pixel $q \in M_2$ we denote by $|q|$ the number of pixels $p \in M_1$ such that $T(p) = q$.

6. For a pixel $q \in M_2$ we denote by $\hat{q}$ the sum over pixels $p \in M_1$ such that $T(p) = q$ of
$|M_1(p) - M_2(T(p))|$.

7. Denote by $p_{bad}$ the average over pixels $p$ from those mapped from $M_1$ to $T(M_1)$ of
$|M_1(p) - M_2(T(p))|$.

8. Denote by $\hat{p}_{bad}$ the value $(\sum_{q \in Q'} \hat{q}) / (\sum_{q \in Q'} |q|)$.

Claim 3.1 With probability at least $1 - 1/(8n^{18})$ over the choice of $\mathcal{P}$ and $Q$ the size of $Q'$ is $\Omega(n/\epsilon)$
and the size of $\mathcal{P}'$ is $\Omega(\log(n)/\epsilon^3)$.

Proof: The probability of any particular pixel in $Q$ belonging to $Q'$ is at least $\epsilon/2$, and $Q$ is of
size $\tilde{\Theta}(n/\epsilon^2)$ (where pixels are chosen independently). Hence the expected number of points in $Q'$ is
$\Omega(n/\epsilon)$. An additional factor of $\Theta(\log(n))$ hidden in the $\tilde{\Theta}$ notation of $k$ assures us (using Chernoff
bounds) that the probability of $Q'$ not being large enough is at most $1/(8n^{18})$ as required.

Assume the first part of the claim holds. Recall that no more than a constant number of pixels
from $M_1$ are mapped to any pixel in $M_2$, and therefore $|T^{-1}(Q')| = \Omega(n/\epsilon)$. As the pixels of $\mathcal{P}$
are chosen independently and uniformly at random from the $n^2$ pixels of $M_1$, each pixel in $\mathcal{P}$ is
mapped to a pixel in $Q'$ with a probability of $\Omega(1/(n\epsilon))$. Hence, the expected size of $\mathcal{P}'$ is $\Omega(1/\epsilon^3)$
and the second part of the claim follows (via a similar argument). □

Claim 3.2 With probability at least $1 - 1/(8n^{18})$ over the choice of $\mathcal{P}$ and $Q$ it holds that
$|\hat{p}_{bad} - p_{bad}| < \epsilon/4$.

Proof: Note that $p_{bad}$ equals $(\sum_{q \in T(M_1)} \hat{q}) / (\sum_{q \in T(M_1)} |q|)$. Now, consider the value
$\hat{p}_{bad} = (\sum_{q \in Q'} \hat{q}) / (\sum_{q \in Q'} |q|)$. Each pixel $q \in Q'$ is chosen uniformly at random and independently from
the pixels in $T(M_1)$. To see that the claim holds we note that with probability at least $1 - 1/(8n^{18})$
(using Hoeffding bounds and the fact that with high probability $|\mathcal{P}'|$ is $\Omega(\log(n)/\epsilon^3)$) we have that
$\left| \sum_{q \in Q'} \hat{q} / |Q'| - \sum_{q \in T(M_1)} \hat{q} / |T(M_1)| \right| = O(\epsilon)$ and that
$\left| \sum_{q \in Q'} |q| / |Q'| - \sum_{q \in T(M_1)} |q| / |T(M_1)| \right| = O(\epsilon)$. The claim follows. □

Claim 3.3 With probability at least $1 - 1/(8n^{18})$ over the choice of $\mathcal{P}$ and $Q$ it holds that
$|Bad_\ell - \hat{p}_{bad}| < \epsilon/2$.

Proof: We have that $\hat{p}_{bad} = (\sum_{q \in Q'} \hat{q}) / (\sum_{q \in Q'} |q|)$. It follows that $\hat{p}_{bad}$ equals
$E_{p \in T^{-1}(Q')}[|M_1(p) - M_2(T(p))|]$,
where $p$ is chosen uniformly at random from $T^{-1}(Q')$. The pixels in $\mathcal{P}'$ are chosen uniformly at
random from $T^{-1}(Q')$ and (with sufficiently high probability), by Claim 3.1 there are $\Omega(\log(n)/\epsilon^3)$
such pixels. The claim follows using Hoeffding bounds. □

We thus see that $Bad_\ell$ is $\epsilon$-close to the average difference between $M_1(p)$ and $M_2(T_\ell(p))$ for
pixels $p \in M_1$ mapped by $T_\ell$ to $M_2$. As $Out_\ell$ is exactly the number of pixels mapped by $T_\ell$ out of
$M_2$, the theorem follows from the definition of distance.
3.2 Lower Bound

We build a lower bound using binary images, and parameterize the lower bound with the image
perimeter (see Theorem 3.2). Note that for an image $M$, having a perimeter of size $k$ implies that
the total difference between pixels and their neighbors in $M$ is $\Omega(k)$. We use a bound on this total
difference to discuss the smoothness of grayscale images later in this work.

Theorem 3.2 Fix $k > 0$ such that $k = o(n)$ ($k$ may depend on $n$). Let $A$ be an algorithm that is
given access to pairs of images $M_1, M_2$ where $\max(P_{M_1}, P_{M_2}) = \Omega(n^2/k)$. In order to distinguish
with high probability between the following two cases:

1. $\Delta(M_1, M_2) < 4/16$

2. $\Delta(M_1, M_2) > 7/16$

$A$ must perform $\Omega(n/k)$ queries.



In order to prove Theorem 3.2 we will first focus on the case where $k = 1$. Namely, we
show that any algorithm that satisfies the conditions as stated in the theorem for all images with
$\max(P_{M_1}, P_{M_2}) = \Omega(n^2)$ must perform $\Omega(n)$ queries. Following this we will explain how the proof
extends to the case of a general $k$.

We use Yao's principle - we give two distributions $\mathcal{D}_1, \mathcal{D}_2$ over pairs of images such that the
following holds:

1. $\Pr_{(M_1, M_2) \sim \mathcal{D}_1}[\Delta(M_1, M_2) > 7/16] > 1 - o(1)$

2. $\Pr_{(M_1, M_2) \sim \mathcal{D}_2}[\Delta(M_1, M_2) < 4/16] = 1$

and show that any deterministic algorithm that distinguishes with high probability between pairs
drawn from $\mathcal{D}_1$ and those drawn from $\mathcal{D}_2$ must perform $\Omega(n)$ expected queries. We now turn to
describe the distributions.

The distribution $\mathcal{D}_1$ is the distribution of pairs of images where every pixel in $M_1$ and every pixel
in $M_2$ is assigned the value 1 with probability 0.5 and the value 0 otherwise, independently. Pairs
in the distribution $\mathcal{D}_2$ are constructed as follows. $M_1$ is chosen as in $\mathcal{D}_1$. We now choose uniformly
at random two values $s_h, s_v$ ranging each from 0 to $n/8$. Pixels $(i, j)$ in $M_2$ where $i < s_h$ or $j < s_v$
are chosen at random as in $\mathcal{D}_1$. The remaining pixels $(i, j)$ satisfy $M_2(i, j) = M_1(i - s_h, j - s_v)$.
Intuitively, the image $M_2$ is created by taking $M_1$ and shifting it both horizontally and vertically,
and filling in the remaining space with random pixels.
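
The two distributions can be sampled as in the sketch below. The function names are hypothetical, and one boundary convention is an assumption made here: with 1-indexed pixels, the copied region is taken to be $i > s_h$ and $j > s_v$, so that $M_1(i - s_h, j - s_v)$ is always a valid pixel.

```python
import random


def random_image(n, rng):
    """An n x n image whose pixels are independent uniform bits."""
    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]


def sample_d1(n, rng):
    """A pair of independent random binary images."""
    return random_image(n, rng), random_image(n, rng)


def sample_d2(n, rng):
    """A random image M1 and a shifted copy M2 padded with fresh random bits."""
    M1 = random_image(n, rng)
    s_h = rng.randint(0, n // 8)
    s_v = rng.randint(0, n // 8)
    M2 = random_image(n, rng)  # the uncovered strip keeps these random bits
    for i in range(s_h + 1, n + 1):
        for j in range(s_v + 1, n + 1):
            M2[i - 1][j - 1] = M1[i - s_h - 1][j - s_v - 1]
    return M1, M2, s_h, s_v
```

Returning the shifts alongside the images is a convenience for experiments; an algorithm in the lower-bound setting only sees the two images through pixel queries.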

Both distributions $\mathcal{D}_1$ and $\mathcal{D}_2$ possess the required limitation on the size of the boundaries
(i.e. $\max(P_{M_1}, P_{M_2}) = \Omega(n^2)$). It suffices to show this with respect to the image $M_1$, which
is constructed in the same way in both distributions. It is easy to see (since $M_1$'s pixels are
independently taken to be 0 or 1 uniformly at random) that with probability $1 - o(1)$ the perimeter
$P_{M_1}$ is at least a constant fraction of $n^2$.

We now proceed to prove the properties of $\mathcal{D}_1$ and $\mathcal{D}_2$. Starting with $\mathcal{D}_2$, given the transformation
$T$ that shifts points by $s_h$ and $s_v$, all but at most a 15/64 fraction (which is smaller than 1/4) of
$M_1$'s area matches exactly with a corresponding area in $M_2$ and the following claim holds.

Claim 3.4 $\Pr_{(M_1, M_2) \sim \mathcal{D}_2}[\Delta(M_1, M_2) < 4/16] = 1$
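
The 15/64 figure can be checked with a short calculation. It assumes only what the construction states: both shifts are at most $n/8$, and under the shift transformation $T$ every pixel of $M_1$ that lands inside $M_2$ lands on a copy of itself.

```latex
\[
\frac{(n - s_h)(n - s_v)}{n^2} \;\ge\; \left(1 - \frac{1}{8}\right)^2 = \frac{49}{64},
\qquad\text{so}\qquad
\Delta_T(M_1, M_2) \;\le\; 1 - \frac{49}{64} = \frac{15}{64} < \frac{16}{64} = \frac{4}{16}.
\]
```

Since $\Delta(M_1, M_2)$ is a minimum over all image-affine transformations, it is bounded by $\Delta_T(M_1, M_2)$ for this particular $T$.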

In order to prove that pairs of images drawn from $\mathcal{D}_1$ typically have a distance of at least 7/16
we first state the following claim [8]:

Claim 3.5 The number of image-affine transformations $T$ between $n \times n$ images $M_1$ and $M_2$ that
map at least one of $M_1$'s pixels into $M_2$ is polynomial in $n$.

Claim 3.6 $\Pr_{(M_1, M_2) \sim \mathcal{D}_1}[\Delta(M_1, M_2) > 7/16] > 1 - o(1)$

Proof: Consider two images $M_1, M_2$ that are sampled from $\mathcal{D}_1$. For an
arbitrary transformation $T$, the value $1 - \Delta_T(M_1, M_2)$ equals $\Pr_{p \in M_1}[T(p) \in M_2 \wedge M_1(p) = M_2(T(p))]$. For any
pixel $p$, over the choice of $M_1$ and $M_2$, it holds that $\Pr[T(p) \in M_2 \wedge M_1(p) = M_2(T(p))] \le 1/2$ (if
$T(p) \in M_2$ the probability is 1/2). The random (over the choice of $M_1, M_2$) number of matched pixels,
$n^2(1 - \Delta_T(M_1, M_2))$, therefore has an expectation of at most $n^2/2$. As it is a sum of at most $n^2$ independent 0-1 random
variables, each with expectation 1/2, the probability that $\Delta_T(M_1, M_2) < 1/2 - \epsilon$ for any positive
fixed $\epsilon$ is $O(e^{-n})$. As $\Delta(M_1, M_2) = \min_T \Delta_T(M_1, M_2)$, and as there are at most a polynomial
number of transformations $T$, the claim follows using a union bound. □



The proof of Theorem 3.2 for the case k = 1 is a consequence of the following claim. 

Claim 3.7 Any algorithm that given a pair of $n \times n$ images $M_1, M_2$ acts as follows:

1. Returns 1 with probability at least 2/3 if $\Delta(M_1, M_2) < 4/16$.

2. Returns 0 with probability at least 2/3 if $\Delta(M_1, M_2) > 7/16$.

must perform $\Omega(n)$ expected queries.

Proof: To show this we consider any deterministic algorithm that can distinguish with probability
greater than $1/2 + \epsilon$ (for a constant $\epsilon > 0$) between the distributions $\mathcal{D}_1$ and $\mathcal{D}_2$. Assume (toward a
contradiction) that such an algorithm $A$ that performs $m = o(n)$ queries exists. We will show that with
very high probability over the choice of images, any new query $A$ performs is answered independently
with probability 0.5 by 0 and with probability 0.5 by 1. This implies the theorem.

The fact that any new query $A$ performs is answered in this way is obvious for $\mathcal{D}_1$ - here the
pixels are indeed chosen uniformly at random.

We now describe a process $P$ that answers a series of $m$ queries performed by $A$ in a way that
produces the same distribution of answers to these queries as that produced by pairs of images
drawn from $\mathcal{D}_2$. This will complete the proof. The process $P$ works as follows (we assume without
loss of generality that $A$ never queries the same pixel twice):

1. Select $m$ bits $r_1, \ldots, r_m$ uniformly and independently at random. These will (typically) serve
as the answers to $A$'s queries.

2. Select uniformly at random two values $s_h, s_v$ ranging each from 0 to $n/8$.

3. For $q_k = (i, j)$, the $k$'th pixel queried by $A$, return the following:

(a) If $q_k$ is queried in $M_1$, and $M_2(i + s_h, j + s_v)$ was sampled, return $M_2(i + s_h, j + s_v)$.

(b) If $q_k$ is queried in $M_2$, and $M_1(i - s_h, j - s_v)$ was sampled, return $M_1(i - s_h, j - s_v)$.

(c) Else, return $r_k$.
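
The process $P$ can be sketched as a lazy oracle. This is an illustration only: the class name `ShiftedPairOracle` is hypothetical, and instead of fixing the bits $r_1, \ldots, r_m$ in advance it draws a fresh bit whenever step (c) applies, which is assumed here to give the same distribution of answers.

```python
import random


class ShiftedPairOracle:
    """Answers pixel queries consistently with a shifted pair of images."""

    def __init__(self, n, rng):
        self.rng = rng
        self.s_h = rng.randint(0, n // 8)
        self.s_v = rng.randint(0, n // 8)
        self.answers = {}  # (image, i, j) -> bit already returned

    def query(self, image, i, j):
        """Return the value of pixel (i, j) in image 1 or image 2."""
        key = (image, i, j)
        if key in self.answers:
            return self.answers[key]
        # The pixel in the other image that is forced to hold the same value.
        if image == 1:
            twin = (2, i + self.s_h, j + self.s_v)  # step (a)
        else:
            twin = (1, i - self.s_h, j - self.s_v)  # step (b)
        if twin in self.answers:
            bit = self.answers[twin]
        else:
            bit = self.rng.randint(0, 1)  # step (c): a fresh uniform bit
        self.answers[key] = bit
        return bit
```

As long as no two queried locations differ by exactly the hidden shift, every answer is a fresh uniform bit, which is the event the remainder of the proof bounds.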
Obviously, $P$ has the same distribution of answers to queries as that of images drawn from
$\mathcal{D}_2$ - the choice of $s_h, s_v$ is exactly as that done when selecting images from $\mathcal{D}_2$, and the values of
pixels are chosen in a way that respects the constraints in this distribution. We now show that the
probability of reaching Steps 3a and 3b is $o(1)$. If $P$ does not reach these steps it returns $r_1, \ldots, r_m$
and $A$ sees answers that were selected uniformly at random. Hence, the claim follows.

Having fixed $r_1, \ldots, r_m$ in Step 1, consider the queries $q'_1, \ldots, q'_m$ that $A$ performs when answered
$r_1, \ldots, r_{m-1}$ (that is, the query $q'_1$ is the first query $A$ performs. If it is answered by $r_1$ it performs
the query $q'_2$, etc.). In fact, we will ignore the image each query is performed in, and consider only
the set of pixel locations $\{p_k = (i_k, j_k)\}$. Step 3a or 3b can only be reached if a pair of pixels $p_k, p_\ell$
satisfies $|i_k - i_\ell| = s_h$ and $|j_k - j_\ell| = s_v$. There are $O(m^2)$ such pairs, and as $m = o(n)$ we have that
$m^2 = o(n^2)$. As the number of choices of $s_v, s_h$ is in $\Theta(n^2)$, the probability of $s_v, s_h$ being selected
so that such an event occurs is $o(1)$ as required. □



Proving Theorem 3.2 for the Case $k > 1$ (sketch). We construct distributions similar to
those above, except that instead of considering single pixels we partition each image to roughly
$n^2/k^2$ blocks of size $k \times k$, organized in a grid. The distributions $\mathcal{D}_1, \mathcal{D}_2$ are similar, except that
now we assign the same value to all the pixels in each block. For the distribution $\mathcal{D}_2$ we select
$s_v, s_h$ to shift the image by multiples of $k$. The remainder of the proof is similar to the case where
$k = 1$, but the number of queries that an algorithm must perform decreases from $\Omega(n)$ to $\Omega(n/k)$,
while the boundary size decreases from $\Theta(n^2)$ to $\Omega(n^2/k)$.

4 The Smooth Image Case 

4.1 The Algorithm for Binary Images with Bounded Perimeter 

Given a pair of binary images $M_1, M_2$ with $P_{M_1} = O(n)$ and $P_{M_2} = O(n)$ our approach to finding
an image-affine transformation $T$ such that $\Delta_T(M_1, M_2) \le \Delta(M_1, M_2) + \epsilon$ is as follows. We
iterate through a set of image-affine transformations that contains a transformation that is close to
optimal (for all images with a perimeter bounded as above), approximating the quality of every
transformation in this set. We return the transformation that yields the best result.

We first show (in Claim 4.1) how to approximate the value $\Delta_T(M_1, M_2)$ given a transformation
$T$. We then show that for two affine transformations $\tilde{T}, \tilde{T}'$ that are close in the sense that for
every point $p$ in the range $\{1, \ldots, n\}^2$ the values $\tilde{T}(p)$ and $\tilde{T}'(p)$ are not too far (in Euclidean
distance), the following holds. For the image-affine transformations $T, T'$ based on $\tilde{T}$ and $\tilde{T}'$ the
values $\Delta_T(M_1, M_2)$ and $\Delta_{T'}(M_1, M_2)$ are close. This is formalized in Theorem 4.1 and Corollary ??.
Finally we claim that given a set $\mathcal{T}$ of affine transformations such that for every affine transformation
there exists a transformation close to it in $\mathcal{T}$, it suffices for our purposes to check all image-affine
transformations based on transformations in $\mathcal{T}$. In Section 4.2 we give the construction of such a
set.

The following claims and proofs are given in terms of approximating the distance (a numerical
quantity) between the images. However, the algorithm is constructive in the sense that it
finds a transformation that has the same additive approximation bounds as those of the distance
approximation.

Claim 4.1 Given images $M_1$ and $M_2$ of size $n \times n$ and an image-affine transformation $T$, let
$d = \Delta_T(M_1, M_2)$. Algorithm 2 returns a value $d'$ such that $|d' - d| \le \epsilon$ with probability 2/3 and
performs $O(1/\epsilon^2)$ queries.

Algorithm 2 Input: Oracle access to $n \times n$ images $M_1, M_2$, a precision parameter $\epsilon$ and a
transformation $T$ (given as a matrix and translation vector).

1. Sample $O(1/\epsilon^2)$ values $p \in M_1$. Check for each $p$ whether $T(p) \in M_2$, and if so check whether
$M_1(p) = M_2(T(p))$.

2. Return the proportion of sampled values that fail the criteria $T(p) \in M_2$ and $M_1(p) = M_2(T(p))$,
that is, the values that contribute to $\Delta_T(M_1, M_2)$.

The approximation is correct to within $\epsilon$ using an additive Chernoff bound.
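
A minimal sketch of this sampling estimator for binary images follows. The function name `estimate_distance`, the callable representation of $T$, and the constant `c` in the sample size are assumptions for illustration; the sketch returns the fraction of sampled pixels that are mapped outside $M_2$ or to a pixel with a different value, so that the returned value estimates $\Delta_T(M_1, M_2)$.

```python
import math
import random


def estimate_distance(M1, M2, T, eps, rng=None, c=4):
    """Estimate the distance under T from about c / eps^2 sampled pixels of M1."""
    rng = rng or random.Random()
    n = len(M1)
    samples = math.ceil(c / eps ** 2)  # c is an assumed constant
    mismatches = 0
    for _ in range(samples):
        i, j = rng.randint(1, n), rng.randint(1, n)
        x, y = T((i, j))
        inside = 1 <= x <= n and 1 <= y <= n
        # A sample counts against T if it leaves M2 or lands on a different value.
        if not inside or M1[i - 1][j - 1] != M2[x - 1][y - 1]:
            mismatches += 1
    return mismatches / samples
```

The number of pixel reads is at most twice the number of samples and is independent of $n$.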
We now define a notion of distance between affine transformations (which relates to points in 
the plane): 

Definition 4.1 Let $T$ and $T'$ be affine transformations. The $\ell_\infty^n$ distance between $T$ and $T'$ is
defined as $\max_{p \in [1, n+1)^2} \|T(p) - T'(p)\|_2$.

The notion of $\ell_\infty^n$ distance simply quantifies how far the mapping of a point in an image according
to $T$ may be from its mapping by $T'$. Note that this definition doesn't depend on the pixel values
of the images, but only on the mappings $T$ and $T'$, and on the image dimension $n$.
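
This distance can be evaluated without scanning the whole square. The difference $T(p) - T'(p)$ is itself an affine function of $p$, and the Euclidean norm of an affine function is convex, so its largest value over a square is approached at a corner. The sketch below, with the hypothetical name `affine_distance`, evaluates the four corners of the closed square $[1, n+1]^2$; since the square in the definition is half-open, the returned value is, strictly speaking, the supremum.

```python
import math


def affine_distance(A1, t1, A2, t2, n):
    """Supremum over p in [1, n+1)^2 of the Euclidean norm of T(p) - T'(p).

    T(p) = A1 p + t1 and T'(p) = A2 p + t2, with 2 x 2 matrices given as
    nested tuples and translations given as pairs.
    """
    best = 0.0
    for x in (1, n + 1):
        for y in (1, n + 1):
            dx = (A1[0][0] - A2[0][0]) * x + (A1[0][1] - A2[0][1]) * y + (t1[0] - t2[0])
            dy = (A1[1][0] - A2[1][0]) * x + (A1[1][1] - A2[1][1]) * y + (t1[1] - t2[1])
            best = max(best, math.hypot(dx, dy))
    return best
```

For two transformations that differ only in their translation vectors, the distance is the length of the difference of the translations, independent of $n$.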



The following fact will be needed for the proof of Theorem 4.1 



Claim 4.2 Given a square subsection $M$ of a binary image and an integer $b$, let $P_M$ denote the
number of boundary pixels in $M$. If $M$ contains at least $b$ 0-pixels and at least $b$ 1-pixels, then
$P_M \ge \sqrt{b}$.

Proof: Let $M$ be a square of $d \times d$ pixels. Note that $d \ge \sqrt{b}$. To see that the claim holds we consider
three cases. In the first case, all rows and all columns of $M$ contain both 0-pixels and 1-pixels. In such a
case each row contains at least one boundary pixel, so $P_M \ge d \ge \sqrt{b}$, and we are done. In the second
case there exists, without loss of generality, a row that does not contain the value 0, and all columns
contain the value 0. Again this means there are at least $d$ boundary pixels (one for each column),
so $P_M \ge d \ge \sqrt{b}$, and we are done. Finally, consider the case that there are both rows and columns
that do not contain the value 0. This means that there is a boundary pixel for each row and for
each column that does contain the value 0. If there were fewer than $\sqrt{b}$ boundary pixels this would
mean there are fewer than $\sqrt{b}$ rows and fewer than $\sqrt{b}$ columns that contain 0-pixels, and $M$ could not
contain $b$ different 0-pixels. This would lead to a contradiction, and thus $P_M \ge \sqrt{b}$, and we are done. ■
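For small squares the claim can be checked exhaustively. The sketch below is only a sanity check, not part of the proof. It assumes that a boundary pixel is one with a horizontally or vertically adjacent pixel of a different value (the paper's adjacency may be wider, which could only increase the count), and it takes $b$ to be the smaller of the two pixel counts.

```python
from itertools import product


def boundary_count(img):
    """Number of pixels with a 4-neighbour of a different value (assumed adjacency)."""
    d = len(img)
    count = 0
    for r, c in product(range(d), repeat=2):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < d and 0 <= cc < d and img[rr][cc] != img[r][c]:
                count += 1
                break
    return count


def claim_holds_for_all_squares(d):
    """Check P_M >= sqrt(b), with b = min(#0-pixels, #1-pixels), on every d x d image."""
    for bits in product((0, 1), repeat=d * d):
        img = [bits[r * d:(r + 1) * d] for r in range(d)]
        ones = sum(bits)
        b = min(ones, d * d - ones)
        # Compare squares to stay in integer arithmetic.
        if boundary_count(img) ** 2 < b:
            return False
    return True
```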

We now turn to a theorem that leads directly to our main upper-bound results. 

Theorem 4.1 Let $M_1, M_2$ be $n \times n$ images and let $\delta$ be a constant in $(0, \sqrt{2})$. Let $T$ and $T'$ be
image affine transformations based on the affine transformations $T, T'$, such that $\ell_\infty^n(T, T') < \delta n$.
It holds that

$$\Delta_{T'}(M_1, M_2) \le \Delta_T(M_1, M_2) + O\left(\frac{\delta P_{M_2}}{n}\right)$$

Proof:

The distance $\Delta_T(M_1, M_2) = \frac{1}{n^2} |\{p \in M_1 \mid T(p) \notin M_2 \lor M_1(p) \ne M_2(T(p))\}|$ is composed of
two parts. The first is the portion of pixels from $M_1$ that $T$ maps out of $M_2$. The second is the
portion of pixels in $M_1$ that $T$ maps to pixels that have a different value in $M_2$. We will bound
$\Delta_{T'}(M_1, M_2) - \Delta_T(M_1, M_2)$. This amounts to bounding the change in the two values mentioned
above.

We begin by bounding the number of pixels from $M_1$ that are mapped by $T$ to pixels of $M_2$
but are not mapped to such pixels by $T'$. As $\ell_\infty^n(T, T') < \delta n$, all such pixels are at most $\delta n$-far from
the outermost pixels of the image. We will bound the number of such pixels by $O(\delta n^2)$. Since
$P_{M_2} \ge n$ (for it contains all the outermost pixels in $M_2$) and since we normalize by $n^2$, these pixels
contribute $O(\delta P_{M_2} / n)$ as required. We restrict the remaining analysis to pixels that have at least a
distance of $\delta n$ from the outermost pixels in the image.

The second value can be viewed as the number of new mismatches between $M_1$ and $M_2$ that
are introduced when replacing $T$ by $T'$ (which is not very different from $T$), and we will discuss
this change (see Figure 1). Formally, if we denote this amount by
$\mathrm{mis}_{T \to T'} = |\{p \in M_1 \mid M_1(p) = M_2(T(p)) \ne M_2(T'(p))\}|$, it would suffice to show that

$$\mathrm{mis}_{T \to T'} = O(\delta n P_{M_2})$$

(the amount of mismatches is normalized by $n^2$, the size of the image, in order to get the difference).
We will bound the amount of new mismatches by breaking the image $M_2$ into a grid of $\delta n \times \delta n$
squares ($1/\delta$ such squares in each dimension), showing how the contribution of each square to
$\mathrm{mis}_{T \to T'}$ depends on its contribution to the perimeter of the image $M_2$. For integers $i$ and $j$, both
between 1 and $1/\delta$, let $b_{i,j}$ be the $\delta n \times \delta n$ square located at the $i$th row and $j$th column of the
squares grid defined above. Summing over these squares, we can write:

$$\mathrm{mis}_{T \to T'} = \sum_{i=1}^{1/\delta} \sum_{j=1}^{1/\delta} |\{p \in M_1 \mid T(p) \in b_{i,j},\ M_1(p) = M_2(T(p)) \ne M_2(T'(p))\}|$$

We now give several additional definitions:

• Let $\mathrm{mis}^{i,j}_{T \to T'} = |\{p \in M_1 \mid T(p) \in b_{i,j},\ M_1(p) = M_2(T(p)) \ne M_2(T'(p))\}|$ be the contribution
of $b_{i,j}$ to $\mathrm{mis}_{T \to T'}$. This definition implies that:

  - $\mathrm{mis}_{T \to T'} = \sum_{i=1}^{1/\delta} \sum_{j=1}^{1/\delta} \mathrm{mis}^{i,j}_{T \to T'}$

  - For any $i$ and $j$, $\mathrm{mis}^{i,j}_{T \to T'}$ is an integer in the range $[0, \delta^2 n^2]$

• Let $B_{i,j}$ denote the $3\delta n \times 3\delta n$ square (a block of $3 \times 3$ original grid squares), containing the
square $b_{i,j}$ in its center.

• Let $P^{i,j}_{M_2}$ be the number of pixels in the perimeter of $M_2$ that lie within the square $B_{i,j}$.

It obviously holds that:

$$\sqrt{\mathrm{mis}^{i,j}_{T \to T'}} \le \delta n \qquad (1)$$

$$\sum_{i=2}^{1/\delta - 1} \sum_{j=2}^{1/\delta - 1} P^{i,j}_{M_2} \le 9 P_{M_2} \qquad (2)$$

Since $\ell_\infty^n(T, T') < \delta n$, each pixel $p \in M_1$ that is mapped by $T$ into $b_{i,j}$ is certainly mapped by
$T'$ into $B_{i,j}$. It follows that

$$\mathrm{mis}^{i,j}_{T \to T'} = |\{p \in M_1 \mid T(p) \in b_{i,j},\ T'(p) \in B_{i,j},\ M_1(p) = M_2(T(p)) \ne M_2(T'(p))\}|$$

This amount counts pixels $p \in M_1$ that are either 0-pixels or 1-pixels. Assume, without
loss of generality, that there are more such 0-pixels. These pixels account for at least half the
amount:

$$\mathrm{mis}^{i,j}_{T \to T'} \le 2 \cdot |\{p \in M_1 \mid T(p) \in b_{i,j},\ T'(p) \in B_{i,j},\ 0 = M_1(p) = M_2(T(p)) \ne M_2(T'(p))\}|$$

This implies that there are at least $\frac{1}{2f(c)} \cdot \mathrm{mis}^{i,j}_{T \to T'}$ 0-pixels in $b_{i,j}$ and at least $\frac{1}{2f(c)} \cdot \mathrm{mis}^{i,j}_{T \to T'}$
1-pixels in $B_{i,j}$, where $f(c)$ is a constant depending only on $c$ (since our scaling factors are within
the range $[1/c, c]$, at most $f(c) = O(c^2)$ pixels from $M_1$ are mapped to the same pixel in $M_2$). In particular,
the larger square $B_{i,j}$ contains at least $\frac{1}{2f(c)} \cdot \mathrm{mis}^{i,j}_{T \to T'}$ 0-pixels as well as at least $\frac{1}{2f(c)} \cdot \mathrm{mis}^{i,j}_{T \to T'}$
1-pixels.



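Using Claim 4.2 we can conclude that:

$$P^{i,j}_{M_2} \ge \sqrt{\frac{1}{2f(c)} \cdot \mathrm{mis}^{i,j}_{T \to T'}} \qquad (3)$$

and using the bounds of equations (1) and then (2) and (3), we can conclude that:

$$\mathrm{mis}_{T \to T'} = \sum_{i=2}^{1/\delta - 1} \sum_{j=2}^{1/\delta - 1} \mathrm{mis}^{i,j}_{T \to T'} \le \delta n \cdot \sum_{i=1}^{1/\delta} \sum_{j=1}^{1/\delta} \sqrt{\mathrm{mis}^{i,j}_{T \to T'}} \le 9 \sqrt{2 f(c)}\, \delta n P_{M_2}$$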



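Definition 4.2 Let $\mathcal{A}$ be the set of image-affine transformations. For a positive $\alpha$, the set of
transformations $\mathcal{T} = \{T_i\}_{i=1}^{\ell}$ is an $\alpha$-cover of $\mathcal{A}$ if for every $A$ in $\mathcal{A}$, there exists some $T_i$ in $\mathcal{T}$,
such that $\ell_\infty^n(A, T_i) < \alpha$.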

We are going to show that for any given $n$ and $\delta > 0$ there is a $\delta n$-cover of $\mathcal{A}$ whose size
does not depend on $n$ but only on $\delta$. Using this fact, given two images $M_1$ and $M_2$, we will run
Algorithm 2 on every member of the cover, and get an approximation of $\Delta(M_1, M_2)$. In fact, we
find a transformation $T \in \mathcal{A}$ that realizes this bound.

Claim 4.3 Let $\mathcal{T} = \{T_i\}_{i=1}^{\ell}$ be a $\delta n$-cover of $\mathcal{A}$ and let $M_1, M_2$ be two $n \times n$ images. A
transformation $T \in \mathcal{T}$ such that

$$|\Delta_T(M_1, M_2) - \Delta(M_1, M_2)| \le O\left(\frac{\delta}{n} \cdot \max(P_{M_1}, P_{M_2}) + \epsilon\right)$$

can be found with high probability using $O(\ell / \epsilon^2)$ queries.

Figure 1: Consider two $\delta n$-close transformations between binary images $M_1$ and $M_2$ (the white/gray
areas in the images correspond to 0/1 pixels). The solid and dotted arrows describe the action of the two
transformations on the pixels p±, . . . , p&. The areas between the dashed lines in $M_2$ contain the pixels that are
$\delta n$-close to boundary pixels. The only pixels in $M_1$ that are possibly mapped correctly by one transformation
but not by the other are those that are mapped into the 'dashed' area by one of the transformations. In this
example, only p± and p§ are mapped correctly by one, but not by the other.



Proof: To find such a transformation $T$ we will run Algorithm 2 $m = O(\log \ell)$ times on each
of the $\ell$ transformations $\{T_i\}$ with precision parameter $\epsilon$, and set $s_i$ as the median of the results
given for $T_i$. By the correctness of Claim 4.1 and standard amplification techniques, for each $i$ the
value $s_i$ will differ from $\Delta_{T_i}(M_1, M_2)$ by at most $\epsilon$ with probability at least $1 - \frac{1}{3\ell}$ (we will say such a
value $s_i$ is correct). Using a union bound we get that the probability of any $s_i$ not being correct is
at most 1/3. This will bound our probability of error, and from here on we assume all the values
$s_i$ are indeed correct and show that we will get a transformation as required.

We now consider an image affine transformation $A$ such that $\Delta(M_1, M_2) = \Delta_A(M_1, M_2)$. By
the fact that $\mathcal{T}$ is a $\delta n$-cover, there exists a transformation $T_i$ such that $\ell_\infty^n(A, T_i) < \delta n$. Given
Theorem 4.1, the value $\Delta_{T_i}(M_1, M_2) \le \Delta(M_1, M_2) + O(\frac{\delta}{n} \cdot \max(P_{M_1}, P_{M_2}))$ and thus the minimum
value $s_j$ will not exceed $\Delta(M_1, M_2) + O(\frac{\delta}{n} \cdot \max(P_{M_1}, P_{M_2}) + \epsilon)$. Choosing the transformation $T_j$
that this value is associated with, we get that

$$|\Delta_{T_j}(M_1, M_2) - \Delta(M_1, M_2)| \le O\left(\frac{\delta}{n} \cdot \max(P_{M_1}, P_{M_2}) + \epsilon\right)$$

as required. ■
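The selection step used in the proof of Claim 4.3 (repeat the estimator, take the median per candidate, keep the candidate with the smallest median) can be sketched as follows. The callable `estimate` is a hypothetical hook standing for one run of Algorithm 2 on a fixed pair of images, and the default number of repetitions uses an arbitrary illustrative constant in front of $\log \ell$.

```python
import math
import statistics


def best_in_cover(cover, estimate, repetitions=None):
    """Return (T_j, s_j): the cover element with the smallest median estimate.

    cover: list of candidate transformations (any objects).
    estimate: callable T -> one randomized estimate of Delta_T(M1, M2).
    repetitions: number m of runs per candidate; defaults to a value of
        order log(len(cover)) with an arbitrary constant.
    """
    if repetitions is None:
        repetitions = 2 * math.ceil(math.log2(len(cover) + 1)) + 1
    best_t, best_s = None, float("inf")
    for t in cover:
        # The median discards the minority of runs that landed far off.
        s = statistics.median(estimate(t) for _ in range(repetitions))
        if s < best_s:
            best_t, best_s = t, s
    return best_t, best_s
```

The estimator is invoked `len(cover) * repetitions` times in total.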



In Section 4.2 we show the existence of a $\delta n$-cover of $\mathcal{A}$ whose size is $O(1/\delta^6)$. We can therefore
conclude with the following corollaries:

Corollary 4.2 Given images $M_1, M_2$ and constants $\delta, \epsilon$, we have that $\Delta(M_1, M_2)$ can be
approximated, using $O(1/(\epsilon^2 \delta^6))$ queries, with an additive error of $O(\frac{\delta}{n} \cdot \max(P_{M_1}, P_{M_2}) + \epsilon)$.

Corollary 4.3 Given images $M_1, M_2$ and constants $\delta, \epsilon$ such that $P_{M_1} = O(n)$ and $P_{M_2} = O(n)$,
$\Delta(M_1, M_2)$ can be approximated, using $O(1/(\epsilon^2 \delta^6))$ queries, with an additive error of $O(\delta + \epsilon)$.

4.2 Construction of a $\delta n$-cover of $\mathcal{A}$



In this section we construct a $\delta n$-cover of $\mathcal{A}$, which will be a product of several 1-dimensional
and 2-dimensional grids of transformations, each covering one of the constituting components of a
standard decomposition of affine transformations [7], which is given in the following claim.

Claim 4.4 Every (orientation-preserving) affine transformation matrix $A$ can be decomposed into
$A = T R_2 S R_1$, where $T$, $R_i$, $S$ are translation, rotation and non-uniform scaling matrices.²
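For the linear part of a 2-dimensional transformation, the decomposition of Claim 4.4 can be computed in closed form. The sketch below is an illustration, not taken from the paper: it uses a standard closed-form $2 \times 2$ singular value decomposition, treats the translation $T$ separately as a vector, and all function names are hypothetical. For a matrix with positive determinant both scale factors come out positive and both outer factors are proper rotations.

```python
import math


def rotation(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def decompose(m):
    """Return (phi, (sx, sy), theta) with m = R(phi) * diag(sx, sy) * R(theta).

    In the notation of Claim 4.4: R_2 = R(phi), S = diag(sx, sy), R_1 = R(theta).
    """
    (a, b), (c, d) = m
    e, f = (a + d) / 2, (a - d) / 2
    g, h = (c + b) / 2, (c - b) / 2
    q, r = math.hypot(e, h), math.hypot(f, g)
    a1, a2 = math.atan2(g, f), math.atan2(h, e)  # a1 = phi - theta, a2 = phi + theta
    return (a2 + a1) / 2, (q + r, q - r), (a2 - a1) / 2


def recompose(phi, scales, theta):
    s = [[scales[0], 0.0], [0.0, scales[1]]]
    return matmul(matmul(rotation(phi), s), rotation(theta))
```

Recomposing the three factors reproduces the input matrix up to floating-point error, which gives a quick way to check the decomposition on examples.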

We now describe a 6-dimensional grid, which we will soon prove to be a $\delta n$-cover of $\mathcal{A}$, as needed.

According to Claim 4.4 every affine transformation can be composed of a rotation, scale, rotation
and translation. These primitive transformations correspondingly have 1, 2, 1 and 2 degrees of
freedom. These are: rotation angle, $x$ and $y$ scales, rotation angle, and $x$ and $y$ translations. It
is elementary, for instance, that if we impose a 2-dimensional grid of $x$ and $y$ translations, spaced
in each direction by an interval of $\sqrt{2}\,\delta n$, then for any two neighboring translations $T_1$ and $T_2$ on
this grid it holds that $\ell_\infty^n(T_1, T_2) < \delta n$. Since the range of possible translations is limited to the
interval $[-n, n]$, the size of the 2-dimensional grid is $O(1/\delta^2)$. Similarly for scaling, we are limited
to the interval $[1/c, c]$ and in order to have $\ell_\infty^n(S_1, S_2) < \delta n$ for neighboring scalings we use spacings
of $\Theta(\delta)$. Likewise, we cover the 1-dimensional space of rotations, with angles in the interval $[0, 2\pi]$,
with spacings of $\Theta(\delta)$. Finally, by taking the Cartesian product of these grids, we end up with a
single grid of size $O(1/\delta^6)$.
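The grid described above can be sketched as six 1-dimensional axes whose Cartesian product forms the candidate cover. This is an illustrative sketch under assumptions: the scale and angle spacings are set to `delta` itself as stand-ins for the unspecified constants behind the order-$\delta$ spacings, and the helper names are hypothetical.

```python
import math


def frange(start, stop, step):
    """Values start, start + step, ... up to and including the first one >= stop."""
    count = math.ceil((stop - start) / step) + 1
    return [start + k * step for k in range(count)]


def build_grid(n, delta, c, scale_step=None, angle_step=None):
    """Six axes: rotation R1, scales (sx, sy), rotation R2, translation (tx, ty)."""
    scale_step = scale_step or delta   # assumed constant for the scale spacing
    angle_step = angle_step or delta   # assumed constant for the angle spacing
    shifts = frange(-n, n, math.sqrt(2) * delta * n)
    scales = frange(1 / c, c, scale_step)
    angles = frange(0.0, 2 * math.pi, angle_step)
    return [angles, scales, scales, angles, shifts, shifts]


def grid_size(axes):
    """Size of the Cartesian product, without enumerating it."""
    return math.prod(len(axis) for axis in axes)


def nearest(axis, value):
    return min(axis, key=lambda g: abs(g - value))
```

With translation spacing $\sqrt{2}\,\delta n$, every translation in $[-n, n]^2$ lies within $\delta n$ of a grid translation, and the number of grid points per axis depends on $\delta$ and $c$ but not on $n$.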

It remains to be shown that the grid we defined above, which we denote by $\mathcal{G}$, imposes a $\delta n$-cover
of $\mathcal{A}$.

Claim 4.5 For every $n$, for every $\delta'$, there exists a $\delta' n$-cover of $\mathcal{A}$ of size $O(1/\delta'^6)$.

Proof: Given the grid $\mathcal{G}$ and any image-affine transformation $A$, if we denote by $A'$ the nearest
transformation to $A$ on the grid $\mathcal{G}$, we need to show that $\ell_\infty^n(A, A') < \delta' n$. According to Claim 4.4,
$A$ and $A'$ can be written in the form $A = T R_2 S R_1$ and $A' = T' R_2' S' R_1'$, such that $\ell_\infty^n(T, T')$,
$\ell_\infty^n(R_1, R_1')$, $\ell_\infty^n(S, S')$ and $\ell_\infty^n(R_2, R_2')$ are all at most $\delta n$.

We now measure how differently $A$ and $A'$ might act on a pixel $p$, in order to obtain a bound on
$\ell_\infty^n(A, A')$. At each stage we use the triangle inequality, accumulating additional distance introduced
by each transformation as well as the bounds on the constituting transformations.

$$\begin{aligned}
\|S R_1 p - S' R_1' p\| &\le \|S(R_1 p) - S'(R_1 p)\| + \|S'(R_1 p) - S'(R_1' p)\| \\
&= \|(S - S')(R_1 p)\| + \|S'(R_1 p - R_1' p)\| \\
&\le \delta n + c \|R_1 p - R_1' p\| \le \delta n + c \delta n = (c + 1)\delta n
\end{aligned}$$

$$\begin{aligned}
\|R_2 S R_1 p - R_2' S' R_1' p\| &\le \|R_2(S R_1 p) - R_2'(S R_1 p)\| + \|R_2'(S R_1 p) - R_2'(S' R_1' p)\| \\
&= \|(R_2 - R_2')(S R_1 p)\| + \|R_2'(S R_1 p - S' R_1' p)\| \\
&\le \delta n + \|S R_1 p - S' R_1' p\| \le (c + 2)\delta n
\end{aligned}$$

$$\begin{aligned}
\|A(p) - A'(p)\| &= \|T R_2 S R_1 p - T' R_2' S' R_1' p\| \\
&\le \|T(R_2 S R_1 p) - T'(R_2 S R_1 p)\| + \|T'(R_2 S R_1 p) - T'(R_2' S' R_1' p)\| \\
&= \|(T - T')(R_2 S R_1 p)\| + \|T'(R_2 S R_1 p - R_2' S' R_1' p)\| \\
&\le \delta n + \|R_2 S R_1 p - R_2' S' R_1' p\| \le (c + 3)\delta n
\end{aligned}$$

The construction follows by setting $\delta = \delta' / (c + 3)$. ■

² Arguments are similar for orientation-reversing transformations (which include reflection).

4.3 3-Dimensional Images

In this section we generalize our techniques and results to 3-dimensional images. One important
application of the 3-dimensional (0-1) setting is to the problem of aligning 3-dimensional solid
objects, which are represented by 3-dimensional 0-1 matrices, where the objects are represented
by the 1s. The other motivation is the need to handle 2-dimensional grayscale images. This is done
in Section 4.4, where our algorithm is based on a reduction from grayscale images to 3-dimensional
0-1 images.

In this setting, we are given two images represented by $n \times n \times n$ 0-1 matrices. The image
entries are indexed by voxels, which are triplets in $\{1, \ldots, n\}^3$, and the affine transformations in
the 3-d space act on a voxel by first multiplying it with a non-singular $3 \times 3$ matrix $A$ (which
accounts for rotation and anisotropic scale), then adding a 'translation' vector and finally rounding
down to get a new voxel. The distance under a fixed affine transformation $T$ between two images
$M_1, M_2$ is defined in an analogous way to the 2-dimensional case (Definition 2.5) and is denoted
by $\Delta_T(M_1, M_2)$. So is the distance between two images with respect to affine transformations
(that is, $\Delta(M_1, M_2)$). Voxels are considered adjacent if they differ in each of their coordinates by
at most 1, and a voxel is a boundary voxel if it is adjacent to different valued voxels. Finally, the
perimeter of the image is the set of its boundary voxels together with its outer $6n^2 - 12n + 8$ voxels
(and it is always $\Omega(n^2)$ and $O(n^3)$).
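The action of a 3-dimensional image-affine transformation on a voxel, and the count of outer voxels, can be illustrated with a short sketch. The function names are hypothetical; the count is done by brute force so that it can be compared against the closed form $6n^2 - 12n + 8 = n^3 - (n-2)^3$ for $n \ge 2$.

```python
import math


def apply_affine_3d(matrix, translation, voxel):
    """Image-affine action on a voxel: multiply, translate, then round down."""
    return tuple(
        math.floor(sum(matrix[r][k] * voxel[k] for k in range(3)) + translation[r])
        for r in range(3)
    )


def is_inside(voxel, n):
    """True if the voxel lies in {1, ..., n}^3."""
    return all(1 <= coord <= n for coord in voxel)


def outer_voxel_count(n):
    """Count voxels of {1, ..., n}^3 with some coordinate equal to 1 or n."""
    return sum(
        1
        for x in range(1, n + 1)
        for y in range(1, n + 1)
        for z in range(1, n + 1)
        if 1 in (x, y, z) or n in (x, y, z)
    )
```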

Given two images $M_1, M_2$ and an affine transformation $T$ we can approximate $\Delta_T(M_1, M_2)$
using the same methods as those used in Algorithm 2. The only difference is that we sample voxels
rather than pixels. Thus we have:

Claim 4.6 Given 3-dimensional binary images $M_1$ and $M_2$ of size $n \times n \times n$ and an image-affine
transformation $T$, let $d = \Delta_T(M_1, M_2)$. There is an algorithm that returns a value $d'$ such that
$|d' - d| < \epsilon$ with probability 2/3 and performs $O(1/\epsilon^2)$ queries.



Claim 4.2 generalizes to the following: 



Claim 4.7 Given a cubic subsection $M$ of a binary 3-dimensional image with dimensions $h \times h \times h$,
and an integer $b$, let $P_M$ denote the number of boundary voxels in $M$. If $M$ contains at least $b$
0-voxels and at least $b$ 1-voxels, then $P_M = \Omega(b^{2/3})$.

Proof: Assume without loss of generality that there are fewer 0-voxels than 1-voxels. We index
the voxels in $M$ as $M(i, j, k)$. Let us denote by $M(i, j, \cdot)$ the sum $\sum_{k=1}^{h} M(i, j, k)$, and use $M(i, \cdot, k)$
and $M(\cdot, j, k)$ in a similar manner. We first note several facts:

1. $h \ge (2b)^{1/3}$

2. The number of pairs $(i, j)$ such that $M(i, j, \cdot) > 0$ is at least $b^{2/3}$. This holds because there
are at least $h^3/2$ different 1-voxels. As each pair $(i, j)$ can account for at most $h$ different
1-voxels, there must be at least $h^2/2 \ge b^{2/3}$ such pairs.

3. Either the number of pairs $(i, j)$ such that $M(i, j, \cdot) < h$ is at least $b^{2/3}$, or the number
of pairs $(i, k)$ such that $M(i, \cdot, k) < h$ is at least $b^{2/3}$, or the number of pairs $(j, k)$ such
that $M(\cdot, j, k) < h$ is at least $b^{2/3}$. This follows from the following claim, which is a direct
consequence of Lemma 15.7.5 in Alon and Spencer's book:

Claim 4.8 Consider a set $S$ of $b$ vectors in $S_1 \times S_2 \times S_3$. Let $S_{i,j}$ be the projection of $S$ onto
$S_i \times S_j$ (where $i \ne j$). If $b_{ij} = |S_{i,j}|$ then $b^2 \le \prod_{i<j} b_{ij}$.
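The projection inequality of Claim 4.8 is easy to check numerically on small point sets. The following sketch is only an illustration with hypothetical function names. It computes the three coordinate-plane projections of a set of integer triples and verifies $b^2 \le b_{12} b_{13} b_{23}$ on every non-empty subset of a small cube.

```python
def projection_sizes(points):
    """Sizes of the three coordinate-plane projections of a set of 3-d points."""
    b12 = len({(x, y) for x, y, _ in points})
    b13 = len({(x, z) for x, _, z in points})
    b23 = len({(y, z) for _, y, z in points})
    return b12, b13, b23


def satisfies_projection_bound(points):
    """Check b^2 <= b12 * b13 * b23 for the given points."""
    b = len(set(points))
    b12, b13, b23 = projection_sizes(points)
    return b * b <= b12 * b13 * b23


def all_subsets_satisfy_bound(side=2):
    """Exhaustively test every non-empty subset of a side^3 cube (keep side small)."""
    cube = [(x, y, z) for x in range(side) for y in range(side) for z in range(side)]
    for mask in range(1, 2 ** len(cube)):
        subset = [p for k, p in enumerate(cube) if (mask >> k) & 1]
        if not satisfies_projection_bound(subset):
            return False
    return True
```

A full cube attains the bound with equality, which shows the inequality cannot be improved in general.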

Assume without loss of generality that the number of pairs $(i, j)$ such that $M(i, j, \cdot) < h$ is at
least $b^{2/3}$ and recall that the number of pairs $(i, j)$ such that $M(i, j, \cdot) > 0$ is at least $b^{2/3}$. We
consider two cases:

1. In the first case there are at least $b^{2/3}/2$ pairs of indices $(i, j)$ such that $0 < M(i, j, \cdot) < h$.
Each such pair surely accounts for at least one boundary voxel, and we are done.

2. In the second case there are at least $b^{2/3}/2$ pairs of indices $(i, j)$ such that $M(i, j, \cdot) = 0$ and
at least $b^{2/3}/2$ pairs of indices $(i, j)$ such that $M(i, j, \cdot) = h$. In this case one of the following
will hold:

(a) There are at least $b^{1/3}/2$ indices $i$ such that there exists an index $j$ such that $M(i, j, \cdot) = 0$
and there are at least $b^{1/3}/2$ indices $i$ such that there exists an index $j$ such that
$M(i, j, \cdot) = h$.

(b) There are at least $b^{1/3}/2$ indices $j$ such that there exists an index $i$ such that $M(i, j, \cdot) = 0$
and there are at least $b^{1/3}/2$ indices $j$ such that there exists an index $i$ such that
$M(i, j, \cdot) = h$.



We assume without loss of generality that Case 2a holds. This means that, again, one of two
cases holds:

(a) There are more than $b^{1/3}/2$ indices $i$ such that there are both indices $j_0$ and $j_1$ such that
$M(i, j_0, \cdot) = 0$ and $M(i, j_1, \cdot) = h$. In this case each such index accounts for $h$ boundary
voxels, and we thus have at least $h b^{1/3}/2 \ge b^{2/3}/2$ boundary voxels, and we are done.

(b) Otherwise, there is at least one index $i_0$ such that for all $j$, $M(i_0, j, \cdot) = 0$, and there is at
least one index $i_1$ such that for all $j$, $M(i_1, j, \cdot) = h$. But this means that for any pair
of indices $(j, k)$ it holds that $M(i_0, j, k) = 0$ and $M(i_1, j, k) = 1$, and there must be at
least one boundary voxel for each such pair $(j, k)$, giving us at least $h^2 \ge b^{2/3}$ boundary
voxels, and we are done.



Our "central" Theorem 4.1 generalizes to the following:

Theorem 4.4 Let $M_1, M_2$ be $n \times n \times n$ images and let $\delta$ be a constant in $(0, \sqrt{3})$. Let $T$ and $T'$ be
image affine transformations based on the affine transformations $T, T'$, such that $\ell_\infty^n(T, T') < \delta n$.
It holds that

$$\Delta_{T'}(M_1, M_2) \le \Delta_T(M_1, M_2) + O\left(\frac{\delta P_{M_2}}{n^2}\right)$$



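Proof: (Outline of differences from the original proof)

The square grids $b$ and $B$ are now cubes of edge size $\delta n$ and $3\delta n$ respectively and are parametrized
by the triplet $i, j, k$.

Some of our observations slightly change:

$$\left(\mathrm{mis}^{i,j,k}_{T \to T'}\right)^{1/3} \le \delta n \qquad (4)$$

$$\sum_{i=2}^{1/\delta - 1} \sum_{j=2}^{1/\delta - 1} \sum_{k=2}^{1/\delta - 1} P^{i,j,k}_{M_2} \le 27 P_{M_2} \qquad (5)$$

Using Claim 4.7 we can conclude that:

$$P^{i,j,k}_{M_2} \ge \left(\frac{1}{2f(c)} \cdot \mathrm{mis}^{i,j,k}_{T \to T'}\right)^{2/3} \qquad (6)$$

and using the bounds of equations (4) and then (5) and (6), we can conclude that:

$$\mathrm{mis}_{T \to T'} = \sum_{i=2}^{1/\delta - 1} \sum_{j=2}^{1/\delta - 1} \sum_{k=2}^{1/\delta - 1} \mathrm{mis}^{i,j,k}_{T \to T'} \le \delta n \cdot \sum_{i=1}^{1/\delta} \sum_{j=1}^{1/\delta} \sum_{k=1}^{1/\delta} \left(\mathrm{mis}^{i,j,k}_{T \to T'}\right)^{2/3} \le 27\, (2f(c))^{2/3}\, \delta n P_{M_2}$$

■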

It is straightforward to extend the 2-dimensional case and construct a $\delta n$-cover for the set of
3-dimensional affine transformations where the size of the cover depends only on $\delta$. As in the
2-dimensional case, the $3 \times 3$ matrix $A$ can be decomposed (using the singular value decomposition) into
a product of rotation, scaling and rotation matrices. Together with the final translation vector, we
get a $\delta n$-cover of size $O(1/\delta^{10})$. The existence of such a cover along with a 3-dimensional analog of
Claim 4.3 implies:

Corollary 4.5 Given 3-dimensional images $M_1, M_2$ and fixed constants $\delta, \epsilon > 0$ such that $P_{M_1} =
O(n^2)$ and $P_{M_2} = O(n^2)$, the distance $\Delta(M_1, M_2)$ can be approximated, using $O(1/(\epsilon^2 \delta^{10}))$ queries,
with an additive error of $O(\delta + \epsilon)$.

4.4 Grayscale Images 

In this section we handle 2-dimensional grayscale images by no longer limiting ourselves to binary 
{0,1} valued pixels but rather allowing a pixel p to have any value M(p) in the interval [0,1]. 
This model covers the commonly practiced discretizations (e.g. to 256 grey levels) of the intensity 
information in a digital image. 

In the following definitions we extend the concept of the perimeter to grayscale images.

Definition 4.3 The gradient of a pixel in a grayscale image $M$ is the maximal absolute difference
between the pixel value and the pixel values of its adjacent pixels.

Definition 4.4 The perimeter size $P_M$ of a grayscale image $M$ is defined as the sum of its pixels'
gradients (where the gradient of each of the $4n - 4$ outermost pixels of the image is counted as 1).

Notice that the gradient of a pixel is a real-valued number in $[0, 1]$ and that if we consider a
binary 0-1 image, its boundary pixels are exactly those with gradient one. Also, the perimeter size
is $\Omega(n)$ and $O(n^2)$.
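Definitions 4.3 and 4.4 translate directly into code. In the sketch below, adjacency is taken to mean the eight surrounding pixels, by analogy with the 3-dimensional definition of Section 4.3. That choice, the 0-based indexing and the function names are assumptions of the sketch.

```python
def gradient(img, r, c):
    """Largest absolute difference between pixel (r, c) and an adjacent pixel."""
    n = len(img)
    best = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr or dc) and 0 <= rr < n and 0 <= cc < n:
                best = max(best, abs(img[r][c] - img[rr][cc]))
    return best


def perimeter_size(img):
    """Sum of pixel gradients, counting each outermost pixel as 1."""
    n = len(img)
    total = 0.0
    for r in range(n):
        for c in range(n):
            outermost = r in (0, n - 1) or c in (0, n - 1)
            total += 1.0 if outermost else gradient(img, r, c)
    return total
```

A constant image has perimeter size exactly $4n - 4$, the smallest possible value.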

When dealing with binary 0-1 images, our similarity measure between images was defined to be
the maximal similarity between the images with respect to any affine transformation on the image
pixels. In the grayscale extension we would like to introduce further transformations, allowing our
distance metric to capture (or be invariant to) illumination changes. That is, we would like to
consider images that differ by a global linear change in pixel values to be similar. Such a linear
change first multiplies all image pixel values by a 'contrast' factor con and then adds to them a
'brightness' factor bri. As is customary in the field, pixel values that deviate from the $[0, 1]$ interval as
a result of such a transformation will be truncated so that they stay within the interval. Also, we
limit con to the interval $[1/c, c]$ for some positive constant $c$ and therefore bri can be limited to
$[-c, 1]$ (since con maps a pixel value into the range $[0, c]$). We denote the family of such intensity
transformations by $\mathcal{BC}$.

Definition 4.5 Let $T_1$ and $T_2$ be any two functions from $\mathcal{BC}$. The $\ell_\infty^1$ distance between $T_1$ and
$T_2$ is defined as the maximum over pixel values $v \in [0, 1]$ of $\|T_1(v) - T_2(v)\|_2$ (which equals
$\max_v |T_1(v) - T_2(v)|$).
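An intensity transformation and the distance of Definition 4.5 can be sketched as follows. The maximum over $v \in [0, 1]$ is approximated on a uniform grid. The parameter `steps` and the function names are illustrative assumptions, not part of the paper.

```python
def intensity(con, bri):
    """Return the map v -> con * v + bri, truncated to the interval [0, 1]."""
    def apply(v):
        return min(1.0, max(0.0, con * v + bri))
    return apply


def intensity_distance(l1, l2, steps=1000):
    """Approximate the maximum over v in [0, 1] of |l1(v) - l2(v)|.

    Both maps are piecewise linear, so a fine uniform grid gets close to
    the true maximum; `steps` is an illustrative resolution.
    """
    return max(abs(l1(k / steps) - l2(k / steps)) for k in range(steps + 1))
```

For example, the identity map and a pure brightness shift of 0.25 are at distance 0.25, attained at every $v \le 0.75$ before truncation sets in.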

We can now define the distance between grayscale images under a combination of an affine and 
an intensity transformation. 

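Definition 4.6 Let $T \in \mathcal{A}$ be an affine transformation and let $L \in \mathcal{BC}$ be an intensity
transformation. The distance between grayscale images $M_1, M_2$ with respect to $T$ and $L$ is:

$$\Delta_{T,L}(M_1, M_2) = \frac{1}{n^2} \sum_{p \in M_1} \left( \mathbf{1}_{T(p) \notin M_2} + \mathbf{1}_{T(p) \in M_2} \cdot |M_1(p) - L(M_2(T(p)))| \right)$$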
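For reference, the distance of Definition 4.6 can be evaluated exactly by visiting every pixel of $M_1$. The sketch below is not sublinear and is meant only to make the definition concrete. The storage convention (pixel $(x, y)$ at `m[x - 1][y - 1]`), the rounding-down step and the function name are assumptions.

```python
import math


def grayscale_distance(m1, m2, matrix, translation, con, bri):
    """Evaluate Delta_{T,L}(M1, M2) exactly (reads all n^2 pixels of M1).

    T is p -> floor(matrix * p + translation); L is v -> con * v + bri
    truncated to [0, 1].
    """
    n = len(m1)
    total = 0.0
    for x in range(1, n + 1):
        for y in range(1, n + 1):
            tx = math.floor(matrix[0][0] * x + matrix[0][1] * y + translation[0])
            ty = math.floor(matrix[1][0] * x + matrix[1][1] * y + translation[1])
            if not (1 <= tx <= n and 1 <= ty <= n):
                total += 1.0  # T(p) falls outside M2: full penalty
            else:
                mapped = min(1.0, max(0.0, con * m2[tx - 1][ty - 1] + bri))
                total += abs(m1[x - 1][y - 1] - mapped)
    return total / (n * n)
```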

We can now state our main result: 

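Claim 4.9 Given $n \times n$ grayscale images $M_1, M_2$ and positive constants $\delta$ and $\epsilon$, we can find
transformations $T \in \mathcal{T}$ and $L \in \mathcal{BC}$ such that with high probability

$$|\Delta_{T,L}(M_1, M_2) - \Delta(M_1, M_2)| \le O\left(\frac{\delta}{n} \cdot \max(P_{M_1}, P_{M_2}) + \epsilon\right)$$

using $O(1/(\epsilon^2 \delta^8))$ queries.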



Proof: We will show Claim 4.9 holds by reducing the problem of approximating the distance
between two 2-dimensional grayscale images to that of approximating the distance between two
3-dimensional 0-1 images. In particular, we will map an $n \times n$ grayscale image $M$ to an $n \times n \times n$
binary image $M'$ defined as follows: $M'(i, j, k) = 1$ if and only if $M(i, j) \ge k/n$.
This essentially means that a pixel with intensity $g$ is represented by a column of voxels where
the bottom $\lfloor gn \rfloor$ voxels are 1-voxels and the remaining are 0-voxels. The perimeter $P_{M'}$ of $M'$ is
$O(n^2) + P_M \cdot n$. This follows since a gradient of $g$ at a pixel $p$ creates $gn$ boundary voxels in $M'$.
Any image-affine transformation $T$ of the grayscale image can be applied to a voxel in $M'$ without
changing the voxel's third coordinate, that is, we can see the transformation $T$ as mapping between
columns of voxels. Likewise, intensity transformations $L$ can be seen as applying only to the third
coordinate of a voxel, that is, mapping voxels to higher or lower locations in their corresponding
columns and truncating them to $n$ (1) if their value is larger (smaller) than $n$ (1). This truncation
is equivalent to the truncation of the pixel values to the interval $[0, 1]$ when applying intensity
transformations on grayscale images.
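The reduction from a grayscale image to a 3-dimensional binary image can be sketched in a few lines. The 0-based storage of the third coordinate (`lifted[i][j][k - 1]` for $k = 1, \ldots, n$) and the function names are assumptions of this sketch.

```python
def lift_to_binary(img):
    """Map an n x n grayscale image to an n x n x n 0-1 image.

    lifted[i][j][k - 1] == 1 exactly when img[i][j] >= k / n for k = 1..n,
    written as img[i][j] * n >= k to avoid one division per voxel.
    """
    n = len(img)
    return [[[1 if img[i][j] * n >= k else 0 for k in range(1, n + 1)]
             for j in range(n)] for i in range(n)]


def column_height(lifted, i, j):
    """Number of 1-voxels in the column above pixel (i, j)."""
    return sum(lifted[i][j])
```

Each column consists of a run of 1-voxels followed by 0-voxels, and its height equals $\lfloor gn \rfloor$ for a pixel of intensity $g$.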



We wish to derive a similar result to Corollary 4.3. To do this, we consider a slightly different
metric on 3-dimensional binary images. Namely, we limit the general family of 3-dimensional affine
transformations to include only transformations that apply a 2-dimensional affine transformation
on the first two coordinates as well as scale and translation on the third coordinate (which relate
to the intensity component of the transformation). Call this family of transformations $\mathcal{S}$. Now we
can proceed in a similar fashion to Corollary 4.3. Denote by $M_1'$ and $M_2'$ the resulting 3-dimensional
images after applying our reduction on the 2-dimensional grayscale images $M_1$ and $M_2$. It holds
that $\Delta(M_1, M_2) = \Delta(M_1', M_2')$ as there is a one-to-one correspondence between transformations of
grayscale images defined by a pair $T$ and $L$ between $M_1$ and $M_2$ and transformations in $\mathcal{S}$ between
$M_1'$ and $M_2'$. Furthermore, by the way our reduction was defined, such a corresponding pair of
transformations yields the same distance between both pairs of images.

We can now proceed along the same reasoning leading to Corollary 4.3. Namely, we construct
a $\delta n$-cover for our limited set of 3-dimensional transformations. For the component of the
2-dimensional affine transformation we use the same cover used in Section 4.2, of size $O(1/\delta^6)$. For the
intensity component we use a similar construction by dividing the scale and translation ranges in
the third coordinate to step sizes of $\Theta(\delta n)$ and $\Theta(\delta)$ respectively. The resulting cover is of size
$O(1/\delta^8)$ and it is easily shown to be a valid $\delta n$-cover. The assertion now follows in a similar way to
Corollary 4.3. The only difference is that we consider only the set of restricted transformations $\mathcal{S}$
rather than the set of all 3-dimensional affine transformations.
■

We conclude with the following corollary: 

Corollary 4.6 Given $n \times n$ grayscale images $M_1, M_2$ and constants $\delta, \epsilon$ such that $P_{M_1} = O(n)$
and $P_{M_2} = O(n)$, the distance $\Delta(M_1, M_2)$ can be approximated, using $O(1/(\epsilon^2 \delta^8))$ queries, with an
additive error of $O(\delta + \epsilon)$.